Reward-Modulated Hebbian Learning of Decision Making

Authors

  • Michael Pfeiffer
  • Bernhard Nessler
  • Rodney J. Douglas
  • Wolfgang Maass
Abstract

We introduce a framework for decision making in which the learning of decision making is reduced to its simplest and biologically most plausible form: Hebbian learning on a linear neuron. We cast our Bayesian-Hebb learning rule as reinforcement learning in which certain decisions are rewarded, and we prove that each synaptic weight will on average converge exponentially fast to the log-odds of receiving a reward when its pre- and postsynaptic neurons are active. In our simple architecture, a particular action is selected from the set of candidate actions by a winner-take-all operation. The global reward assigned to this action then modulates the update of each synapse. Apart from this global reward signal, our reward-modulated Bayesian Hebb rule is a pure Hebb update that depends only on the coactivation of the pre- and postsynaptic neurons, not on the weighted sum of all presynaptic inputs to the postsynaptic neuron as in the perceptron learning rule or the Rescorla-Wagner rule. This simple approach to action-selection learning requires that information about sensory inputs be presented to the Bayesian decision stage in a suitably preprocessed form resulting from other adaptive processes (acting on a larger timescale) that detect salient dependencies among input features. Hence our proposed framework for fast learning of decisions also provides interesting new hypotheses regarding neural codes and computational goals of cortical areas that provide input to the final decision stage.
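
The convergence claim in the abstract can be illustrated with a minimal single-synapse sketch. The abstract does not state the exact form of the update, so the rule below is an assumption: whenever the pre- and postsynaptic neurons are both active, the weight moves up by eta * (1 + exp(-w)) after a reward and down by eta * (1 + exp(w)) otherwise. This form was chosen because its expected update is zero exactly when w equals the log-odds of reward. The function names, the learning rate eta, and the reward probability are hypothetical and used only for illustration.

```python
import math
import random


def log_odds(p):
    """Log-odds log(p / (1 - p)) of a reward probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def bayesian_hebb_step(w, rewarded, eta):
    """One update of a synapse whose pre- and postsynaptic neurons were both active.

    ASSUMED rule form (not given in the abstract): the step depends only on
    the current weight and the global reward, not on other presynaptic inputs.
    """
    if rewarded:
        return w + eta * (1.0 + math.exp(-w))
    return w - eta * (1.0 + math.exp(w))


def expected_step(w, p_reward, eta):
    """Average weight change at weight w when reward arrives with probability p_reward."""
    up = eta * (1.0 + math.exp(-w))
    down = eta * (1.0 + math.exp(w))
    return p_reward * up - (1.0 - p_reward) * down


def simulate(p_reward, eta=0.01, steps=20000, seed=0):
    """Run the stochastic rule and return the weight averaged over the last quarter of steps."""
    rng = random.Random(seed)
    w = 0.0
    tail = []
    for t in range(steps):
        rewarded = rng.random() < p_reward  # global reward signal (Bernoulli, assumed)
        w = bayesian_hebb_step(w, rewarded, eta)
        if t >= steps - steps // 4:
            tail.append(w)
    return sum(tail) / len(tail)


if __name__ == "__main__":
    p = 0.8  # hypothetical reward probability
    print("target log-odds:", log_odds(p))
    print("learned weight: ", simulate(p))
```

Under this assumed rule, the expected step is positive below the log-odds and negative above it, so the weight drifts toward that value. The individual weight keeps fluctuating around the target because of the random rewards, which is why the sketch reports a time average rather than the final value.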

Related articles

A Biologically Plausible 3-factor Learning Rule for Expectation Maximization in Reinforcement Learning and Decision Making

One of the most frequent problems in both decision making and reinforcement learning (RL) is expectation maximization involving functionals such as reward or utility. Generally, these problems consist of computing the optimal solution of a density function. Instead of trying to find this exact solution, a common approach is to approximate it through a learning process. In this work we propose a...

Hebbian learning for deciding optimally among many alternatives (almost)

Reward-maximizing performance and neurally plausible mechanisms for achieving it have been completely characterized for a general class of two-alternative decision making tasks, and data suggest that humans can implement the optimal procedure. A greater number of alternatives complicates the analysis, but here too, analytical approximations to optimality that are physically and psychologically ...

Hebbian learning in linear-nonlinear networks with tuning curves leads to near-optimal, multi-alternative decision making

Optimal performance and physically plausible mechanisms for achieving it have been completely characterized for a general class of two-alternative, free response decision making tasks, and data suggest that humans can implement the optimal procedure. The situation is more complicated when the number of alternatives is greater than two and subjects are free to respond at any time, partly due to ...

Hebbian Learning of Bayes Optimal Decisions

Uncertainty is omnipresent when we perceive or interact with our environment, and the Bayesian framework provides computational methods for dealing with it. Mathematical models for Bayesian decision making typically require datastructures that are hard to implement in neural networks. This article shows that even the simplest and experimentally best supported type of synaptic plasticity, Hebbia...

Multi-layer network utilizing rewarded spike time dependent plasticity to learn a foraging task

Neural networks with a single plastic layer employing reward modulated spike time dependent plasticity (STDP) are capable of learning simple foraging tasks. Here we demonstrate advanced pattern discrimination and continuous learning in a network of spiking neurons with multiple plastic layers. The network utilized both reward modulated and non-reward modulated STDP and implemented multiple mech...

Journal:
  • Neural Computation

Volume 22, Issue 6

Pages: -

Publication date: 2010